The Challenge of Small-Scale Repeats for Indel Discovery
نویسندگان
چکیده
Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales they can cause various types of sequence analysis errors, including in alignment, de novo assembly, and annotation, among others. This mini-review highlights the challenges introduced by small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm that is emerging as the most popular and promising approach for detecting indels. The human exome is taken as an example and highlights how these repetitive elements can obscure or introduce errors while detecting these types of mutations.
منابع مشابه
Small insertions are more deleterious than small deletions in human genomes.
Although lines of evidence suggest that small insertions and deletions differ in their mechanisms of formation, there remains the debate on whether natural selection acts differently on the two indel types. Currently available personal genomes and the 1000 Genomes Project permit population level and genome scale comparison of the selection regimes on the two indel types. We first developed a st...
متن کاملAccurate detection of de novo and transmitted indels within exome-capture data using micro-assembly
We present a new open-source algorithm, Scalpel, for sensitive and specific discovery of INDELs in exome-capture data. By combining the power of mapping and assembly, Scalpel carefully searches the de Bruijn graph for sequence paths that span each exon. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for INDEL d...
متن کاملCicArVarDB: SNP and InDel database for advancing genetics research and breeding applications in chickpea
Molecular markers are valuable tools for breeders to help accelerate crop improvement. High throughput sequencing technologies facilitate the discovery of large-scale variations such as single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). Sequencing of chickpea genome along with re-sequencing of several chickpea lines has enabled the discovery of 4.4 million variations inc...
متن کاملIndel seeds for homology search
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguo...
متن کاملEvolution of Protein Domain Repeats in Metazoa
Repeats are ubiquitous elements of proteins and they play important roles for cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally and large scale studies so far had difficulties in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat dete...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 3 شماره
صفحات -
تاریخ انتشار 2015